Results 1 - 20 of 44
1.
Genome Biol ; 24(1): 275, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38041098

ABSTRACT

Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We present HERO, the first "hybrid-hybrid" approach, which uses both de Bruijn graphs and overlap graphs to cater optimally to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by an average of 65% (27 to 95%) and 20% (4 to 61%), respectively. Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Benchmarking
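
The abstract above names de Bruijn graphs as one of the two graph types HERO draws on. As a minimal sketch of that data structure only (not HERO's implementation; the function names, the toy reads, and the `min_count` threshold are illustrative assumptions), the following builds a de Bruijn graph from short reads and lists rarely observed edges. In a mixed sample such edges may be sequencing errors or genuine low-frequency haplotype variants, which is the ambiguity the abstract describes.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to its successor (k-1)-mers with edge counts."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix (k-1)-mer -> suffix (k-1)-mer.
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def weak_edges(graph, min_count):
    """Return edges seen fewer than min_count times, sorted for stable output."""
    return sorted(
        (u, v, count)
        for u, successors in graph.items()
        for v, count in successors.items()
        if count < min_count
    )


# Hypothetical toy input: three identical reads and one read with a T->A difference.
reads = ["ACGTAC", "ACGTAC", "ACGTAC", "ACGAAC"]
graph = build_de_bruijn(reads, k=3)
rare = weak_edges(graph, min_count=2)
```

Here `rare` holds the three edges introduced by the fourth read. Whether they should be corrected away or kept as a variant cannot be decided from counts alone.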
2.
Artif Intell Med ; 134: 102418, 2022 12.
Article in English | MEDLINE | ID: mdl-36462892

ABSTRACT

The COVID-19 pandemic continues to raise urgent questions with respect to therapeutic options. Existing drugs that can be repurposed promise rapid implementation in practice because of their prior approval. Conceivably, there is still room for substantial improvement, because most advanced artificial intelligence techniques for screening drug repositories have not been exploited so far. We construct a comprehensive network by combining year-long curated drug-protein/protein-protein interaction data on the one hand, and most recent SARS-CoV-2 protein interaction data on the other hand. We learn the structure of the resulting encompassing molecular interaction network and predict missing links using variational graph autoencoders (VGAEs), a highly advanced deep learning technique that has not been explored so far. We focus on hitherto unknown links between drugs and human proteins that play key roles in the replication cycle of SARS-CoV-2. Thereby, we establish novel host-directed therapy (HDT) options whose plausibility is confirmed by realistic simulations. As a consequence, many of the predicted links are likely to be crucial for the virus to thrive on the one hand, and can be targeted with existing drugs on the other hand.


Subject(s)
COVID-19 , Humans , SARS-CoV-2 , Artificial Intelligence , Pandemics , Upper Extremity
3.
Nat Commun ; 13(1): 6657, 2022 11 04.
Article in English | MEDLINE | ID: mdl-36333324

ABSTRACT

Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus sequence induced biases that mask true variants in haplotypes of lower frequency showing in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so do not mistakenly mask true variants as errors. We present VeChat, as an approach to implement this idea: VeChat is based on variation graphs, as a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and publicly available at https://github.com/HaploKit/vechat .


Subject(s)
Algorithms , Nanopores , Sequence Analysis, DNA/methods , Haplotypes , Data Analysis , High-Throughput Nucleotide Sequencing , Software
4.
Nucleic Acids Res ; 50(17): e101, 2022 09 23.
Article in English | MEDLINE | ID: mdl-35776122

ABSTRACT

Next-generation sequencing-based metagenomics has made it possible to identify microorganisms in characteristic habitats without the need for lengthy cultivation. Importantly, clinically relevant phenomena such as resistance to medication, virulence or interactions with the environment can vary already within species. Therefore, a major current challenge is to reconstruct individual genomes from the sequencing reads at the level of strains, and not just the level of species. However, strains of one species can differ by only a small number of variants, which makes it difficult to distinguish them. Despite considerable recent progress, related approaches have remained fragmentary so far. Here, we present StrainXpress, as a comprehensive solution to the problem of strain-aware metagenome assembly from next-generation sequencing reads. In experiments, StrainXpress reconstructs strain-specific genomes from metagenomes that involve up to >1000 strains and proves to successfully deal with poorly covered strains. The amount of reconstructed strain-specific sequence exceeds that of the current state-of-the-art approaches by an average of 26.75% across all data sets (first quartile: 18.51%, median: 26.60%, third quartile: 35.05%).


Subject(s)
Metagenome , Metagenomics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA
5.
Front Genet ; 13: 868280, 2022.
Article in English | MEDLINE | ID: mdl-35646097

ABSTRACT

Microbial communities are usually highly diverse and often involve multiple strains from the participating species due to the rapid evolution of microorganisms. In such a complex microecosystem, different strains may show different biological functions. While reconstruction of individual genomes at the strain level is vital for accurately deciphering the composition of microbial communities, the problem has largely remained unresolved so far. Next-generation sequencing has been routinely used in metagenome assembly but there have been struggles to generate strain-specific genome sequences due to the short-read length. This explains why long-read sequencing technologies have recently provided unprecedented opportunities to carry out haplotype- or strain-resolved genome assembly. Here, we propose MetaBooster and MetaBooster-HiFi, as two pipelines for strain-aware metagenome assembly from PacBio CLR and Oxford Nanopore long-read sequencing data. Benchmarking experiments on both simulated and real sequencing data demonstrate that either the MetaBooster or the MetaBooster-HiFi pipeline drastically outperforms the state-of-the-art de novo metagenome assemblers, in terms of all relevant metagenome assembly criteria, involving genome fraction, contig length, and error rates.

6.
Genome Biol ; 23(1): 29, 2022 01 20.
Article in English | MEDLINE | ID: mdl-35057847

ABSTRACT

Haplotype-resolved de novo assembly of highly diverse virus genomes is critical in prevention, control and treatment of viral diseases. Current methods either can handle only relatively accurate short read data, or collapse haplotype-specific variations into consensus sequence. Here, we present Strainline, a novel approach to assemble viral haplotypes from noisy long reads without a reference genome. Strainline is the first approach to provide strain-resolved, full-length de novo assemblies of viral quasispecies from noisy third-generation sequencing data. Benchmarking on simulated and real datasets of varying complexity and diversity confirms this novelty and demonstrates the superiority of Strainline.


Subject(s)
Contig Mapping/methods , Genome, Viral , Haplotypes , SARS-CoV-2/genetics , Software , Benchmarking , COVID-19/virology , High-Throughput Nucleotide Sequencing , Humans , SARS-CoV-2/classification , Sequence Analysis, DNA
7.
Nat Commun ; 12(1): 6744, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34795237

ABSTRACT

Accurate single cell mutational profiles can reveal genomic cell-to-cell heterogeneity. However, sequencing libraries suitable for genotyping require whole genome amplification, which introduces allelic bias and copy errors. The resulting data violate assumptions of variant callers developed for bulk sequencing. Thus, only dedicated models accounting for amplification bias and errors can provide accurate calls. We present ProSolo for calling single nucleotide variants from multiple displacement amplified (MDA) single cell DNA sequencing data. ProSolo probabilistically models a single cell jointly with a bulk sequencing sample and integrates all relevant MDA biases in a site-specific and scalable (because computationally efficient) manner. This achieves a higher accuracy in calling and genotyping single nucleotide variants in single cells in comparison to state-of-the-art tools and supports imputation of insufficiently covered genotypes, when downstream tools cannot handle missing data. Moreover, ProSolo implements the first approach to control the false discovery rate reliably and flexibly. ProSolo is implemented in an extendable framework, with code and usage at: https://github.com/prosolo/prosolo.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Software , Genetic Techniques , Genomics/methods , Humans , Mutation , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods
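
The abstract above mentions controlling the false discovery rate but does not describe how. As a hedged illustration of one generic way to do this with a probabilistic caller (an assumption, not a description of ProSolo's method), the sketch below takes per-call posterior error probabilities and reports the largest set of calls whose mean error probability stays at or below a target `alpha`. The function name and the example values are hypothetical.

```python
def select_calls_at_fdr(posterior_error_probs, alpha):
    """Return indices of calls to report so that the expected FDR is <= alpha.

    Calls are ranked by ascending posterior error probability. The expected
    FDR of a reported set is the mean error probability of its members, so
    the longest prefix of the ranking with mean <= alpha is kept.
    """
    order = sorted(
        range(len(posterior_error_probs)),
        key=lambda i: posterior_error_probs[i],
    )
    selected = []
    running_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        running_sum += posterior_error_probs[idx]
        # The running mean never decreases along the ranking, so the first
        # violation marks the cut-off.
        if running_sum / rank > alpha:
            break
        selected.append(idx)
    return selected


# Hypothetical posterior error probabilities for five candidate variant calls.
error_probs = [0.01, 0.5, 0.02, 0.2, 0.03]
reported = select_calls_at_fdr(error_probs, alpha=0.05)
```

With these example values the three most confident calls are reported. Adding the fourth would push the mean error probability to 0.065, above the 0.05 target.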
8.
Genome Biol ; 22(1): 299, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34706745

ABSTRACT

Haplotype-aware diploid genome assembly is crucial in genomics, precision medicine, and many other disciplines. Long-read sequencing technologies have greatly improved genome assembly. However, current long-read assemblers are either reference based, so introduce biases, or fail to capture the haplotype diversity of diploid genomes. We present phasebook, a de novo approach for reconstructing the haplotypes of diploid genomes from long reads. phasebook outperforms other approaches in terms of haplotype coverage by large margins, in addition to achieving competitive performance in terms of assembly errors and assembly contiguity.


Subject(s)
Diploidy , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Genomics , Humans , Nanopore Sequencing
9.
NPJ Precis Oncol ; 5(1): 15, 2021 Mar 02.
Article in English | MEDLINE | ID: mdl-33654267

ABSTRACT

Cancer is generally characterized by acquired genomic aberrations in a broad spectrum of types and sizes, ranging from single nucleotide variants to structural variants (SVs). At least 30% of cancers have a known pathogenic SV used in diagnosis or treatment stratification. However, research into the role of SVs in cancer has been limited due to difficulties in detection. Biological and computational challenges confound SV detection in cancer samples, including intratumor heterogeneity, polyploidy, and distinguishing tumor-specific SVs from germline and somatic variants present in healthy cells. Classification of tumor-specific SVs is challenging due to inconsistencies in detected breakpoints, derived variant types and biological complexity of some rearrangements. Full-spectrum SV detection with high recall and precision requires integration of multiple algorithms and sequencing technologies to rescue variants that are difficult to resolve through individual methods. Here, we explore current strategies for integrating SV callsets and for enabling the use of tumor-specific SVs in precision oncology.

10.
Sci Rep ; 11(1): 4541, 2021 02 25.
Article in English | MEDLINE | ID: mdl-33633136

ABSTRACT

Myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS) is a chronic disorder characterized by disabling fatigue. Several studies have sought to identify diagnostic biomarkers, with varying results. Here, we innovate this process by combining both mRNA expression and DNA methylation data. We performed recursive ensemble feature selection (REFS) on publicly available mRNA expression data in peripheral blood mononuclear cells (PBMCs) of 93 ME/CFS patients and 25 healthy controls, and found a signature of 23 genes capable of distinguishing cases and controls. REFS highly outperformed other methods, with an AUC of 0.92. We validated the results on a different platform (AUC of 0.95) and in DNA methylation data obtained from four public studies on ME/CFS (99 patients and 50 controls), identifying 48 gene-associated CpGs that predicted disease status as well (AUC of 0.97). Finally, ten of the 23 genes could be interpreted in the context of the derailed immune system of ME/CFS.


Subject(s)
Fatigue Syndrome, Chronic/etiology , Gene Expression Profiling , Gene Expression Regulation , Transcriptome , Biomarkers , Case-Control Studies , Computational Biology/methods , DNA Methylation , Disease Susceptibility , Fatigue Syndrome, Chronic/diagnosis , Models, Biological , RNA, Messenger , ROC Curve , Reproducibility of Results
11.
Bioinformatics ; 37(7): 905-912, 2021 05 17.
Article in English | MEDLINE | ID: mdl-32871010

ABSTRACT

MOTIVATION: The microbes that live in an environment can be identified from the combined genomic material, also referred to as the metagenome. Sequencing a metagenome can result in large volumes of sequencing reads. A promising approach to reduce the size of metagenomic datasets is by clustering reads into groups based on their overlaps. Clustering reads is valuable for facilitating downstream analyses, including computationally intensive strain-aware assembly. As current read clustering approaches cannot handle the large datasets arising from high-throughput metagenome sequencing, a novel read clustering approach is needed. In this article, we propose OGRE, an Overlap Graph-based Read clustEring procedure for high-throughput sequencing data, with a focus on shotgun metagenomes. RESULTS: We show that for small datasets OGRE outperforms other read binners in terms of the number of species included in a cluster, also referred to as cluster purity, and the fraction of all reads that is placed in one of the clusters. Furthermore, OGRE is able to process metagenomic datasets that are too large for other read binners into clusters with high cluster purity. CONCLUSION: OGRE is the only method that can successfully cluster reads in species-specific clusters for large metagenomic datasets without running into computation time or memory issues. AVAILABILITY AND IMPLEMENTATION: Code is made available on GitHub (https://github.com/Marleen1/OGRE). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenome , Software , Algorithms , Cluster Analysis , High-Throughput Nucleotide Sequencing , Metagenomics , Sequence Analysis, DNA
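
The abstract above describes clustering reads into groups based on their overlaps. The idea can be sketched naively as follows. This is an assumption-laden toy, not the OGRE procedure: it uses exact suffix-prefix matches, an all-pairs comparison that would not scale to real metagenomes, and connected components as the clustering rule. The function names and the `min_len` parameter (which must be at least 1) are illustrative.

```python
def suffix_prefix_overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def cluster_reads(reads, min_len):
    """Group read indices into connected components of the overlap graph."""
    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(reads)):
        for j in range(len(reads)):
            if i != j and suffix_prefix_overlap(reads[i], reads[j], min_len):
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(reads)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical toy reads: the first three chain together, as do the last two.
reads = ["ACGTACGT", "ACGTTTGA", "TTGACCAA", "GGGGCCCC", "CCCCAAAT"]
clusters = cluster_reads(reads, min_len=4)
```

A practical tool would tolerate mismatches and use an index rather than comparing every pair of reads; the abstract indicates that handling large datasets is exactly where OGRE differs from earlier read binners.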
12.
Cancers (Basel) ; 12(7)2020 Jul 03.
Article in English | MEDLINE | ID: mdl-32635415

ABSTRACT

Circulating microRNAs (miRNA) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as biomarkers for tumors to both assess their presence and to predict their type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult to assess and interpret by medical experts because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims at reducing the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved reflects a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for inherent biases of classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes. The proposed approach is first tested on a tumor classification problem in order to separate 10 different types of cancer, with samples collected over 10 different clinical trials, and later is assessed on a cancer subtype classification task, with the aim to distinguish triple negative breast cancer from other subtypes of breast cancer. Overall, the presented methodology proves to be effective and compares favorably to other state-of-the-art feature selection methods.

14.
Bioinformatics ; 35(14): i538-i547, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510706

ABSTRACT

MOTIVATION: Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease caused by aberrations in the genome. While several disease-causing variants have been identified, a major part of heritability remains unexplained. ALS is believed to have a complex genetic basis where non-additive combinations of variants constitute disease, which cannot be picked up using the linear models employed in classical genotype-phenotype association studies. Deep learning on the other hand is highly promising for identifying such complex relations. We therefore developed a deep-learning based approach for the classification of ALS patients versus healthy individuals from the Dutch cohort of the Project MinE dataset. Based on recent insight that regulatory regions harbor the majority of disease-associated variants, we employ a two-step approach: first promoter regions that are likely associated with ALS are identified, and second individuals are classified based on their genotype in the selected genomic regions. Both steps employ a deep convolutional neural network. The network architecture accounts for the structure of genome data by applying convolution only to parts of the data where this makes sense from a genomics perspective. RESULTS: Our approach identifies potentially ALS-associated promoter regions, and generally outperforms other classification methods. Test results support the hypothesis that non-additive combinations of variants contribute to ALS. Architectures and protocols developed are tailored toward processing population-scale, whole-genome data. We consider this a relevant first step toward deep learning assisted genotype-phenotype association in whole genome-sized data. AVAILABILITY AND IMPLEMENTATION: Our code will be available on GitHub, together with a synthetic dataset (https://github.com/byin-cwi/ALS-Deeplearning). The data used in this study is available to bona-fide researchers upon request. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Amyotrophic Lateral Sclerosis , Genome , Neural Networks, Computer , Neurodegenerative Diseases , Amyotrophic Lateral Sclerosis/genetics , Genotype , Humans
15.
iScience ; 18: 20-27, 2019 Aug 30.
Article in English | MEDLINE | ID: mdl-31352182

ABSTRACT

The amount of genetic variation discovered in human populations is growing rapidly, leading to challenging computational tasks, such as variant calling. Standard methods for addressing this problem include read mapping, a computationally expensive procedure; thus, mapping-free tools have been proposed in recent years. These tools focus on isolated, biallelic SNPs, providing limited support for multi-allelic SNPs and short insertions and deletions of nucleotides (indels). Here we introduce MALVA, a mapping-free method to genotype an individual from a sample of reads. MALVA is the first mapping-free tool able to genotype multi-allelic SNPs and indels, even in high-density genomic regions, and to effectively handle a huge number of variants. MALVA requires one order of magnitude less time to genotype a donor than alignment-based pipelines, providing similar accuracy. Remarkably, on indels, MALVA provides even better results than the most widely adopted variant discovery tools.

16.
Bioinformatics ; 35(24): 5086-5094, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31147688

ABSTRACT

MOTIVATION: Viruses populate their hosts as a viral quasispecies: a collection of genetically related mutant strains. Viral quasispecies assembly is the reconstruction of strain-specific haplotypes from read data, and predicting their relative abundances within the mix of strains is an important step for various treatment-related reasons. Reference genome independent ('de novo') approaches have yielded benefits over reference-guided approaches, because reference-induced biases can become overwhelming when dealing with divergent strains. While being very accurate, extant de novo methods only yield rather short contigs. The remaining challenge is to reconstruct full-length haplotypes together with their abundances from such contigs. RESULTS: We present Virus-VG as a de novo approach to viral haplotype reconstruction from preassembled contigs. Our method constructs a variation graph from the short input contigs without making use of a reference genome. Then, to obtain paths through the variation graph that reflect the original haplotypes, we solve a minimization problem that yields a selection of maximal-length paths that is optimal in terms of being compatible with the read coverages computed for the nodes of the variation graph. We output the resulting selection of maximal-length paths as the haplotypes, together with their abundances. Benchmarking experiments on challenging simulated and real datasets show significant improvements in assembly contiguity compared to the input contigs, while preserving low error rates compared to the state-of-the-art viral quasispecies assemblers. AVAILABILITY AND IMPLEMENTATION: Virus-VG is freely available at https://bitbucket.org/jbaaijens/virus-vg. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Quasispecies , Algorithms , Genome , Haplotypes , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software
17.
Bioinformatics ; 35(21): 4281-4289, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30994902

ABSTRACT

MOTIVATION: Haplotype-aware genome assembly plays an important role in genetics, medicine and various other disciplines, yet generation of haplotype-resolved de novo assemblies remains a major challenge. Beyond distinguishing between errors and true sequential variants, one needs to assign the true variants to the different genome copies. Recent work has pointed out that the enormous quantities of traditional NGS read data have been greatly underexploited in terms of haplotig computation so far, which reflects that methodology for reference independent haplotig computation has not yet reached maturity. RESULTS: We present POLYploid genome fitTEr (POLYTE) as a new approach to de novo generation of haplotigs for diploid and polyploid genomes of known ploidy. Our method follows an iterative scheme where in each iteration reads or contigs are joined, based on their interplay in terms of an underlying haplotype-aware overlap graph. Along the iterations, contigs grow while preserving their haplotype identity. Benchmarking experiments on both real and simulated data demonstrate that POLYTE establishes new standards in terms of error-free reconstruction of haplotype-specific sequence. As a consequence, POLYTE outperforms state-of-the-art approaches in various relevant aspects, where advantages become particularly distinct in polyploid settings. AVAILABILITY AND IMPLEMENTATION: POLYTE is freely available as part of the HaploConduct package at https://github.com/HaploConduct/HaploConduct, implemented in Python and C++. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Diploidy , Polyploidy , Algorithms , Genome , Haplotypes , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA
18.
Viruses ; 10(5)2018 May 14.
Article in English | MEDLINE | ID: mdl-29757994

ABSTRACT

The Second Annual Meeting of the European Virus Bioinformatics Center (EVBC), held in Utrecht, Netherlands, focused on computational approaches in virology, with topics including (but not limited to) virus discovery, diagnostics, (meta-)genomics, modeling, epidemiology, molecular structure, evolution, and viral ecology. The goals of the Second Annual Meeting were threefold: (i) to bring together virologists and bioinformaticians from across the academic, industrial, professional, and training sectors to share best practice; (ii) to provide a meaningful and interactive scientific environment to promote discussion and collaboration between students, postdoctoral fellows, and both new and established investigators; (iii) to inspire and suggest new research directions and questions. Approximately 120 researchers from around the world attended the Second Annual Meeting of the EVBC this year, including 15 renowned international speakers. This report presents an overview of new developments and novel research findings that emerged during the meeting.


Subject(s)
Computational Biology , Virology , Congresses as Topic , DNA Viruses , Ecology , Genomics , Humans , Societies, Scientific , Software
19.
Genome Res ; 27(5): 835-848, 2017 05.
Article in English | MEDLINE | ID: mdl-28396522

ABSTRACT

A viral quasispecies, the ensemble of viral strains populating an infected person, can be highly diverse. For optimal assessment of virulence, pathogenesis, and therapy selection, determining the haplotypes of the individual strains can play a key role. As many viruses are subject to high mutation and recombination rates, high-quality reference genomes are often not available at the time of a new disease outbreak. We present SAVAGE, a computational tool for reconstructing individual haplotypes of intra-host virus strains without the need for a high-quality reference genome. SAVAGE makes use of either FM-index-based data structures or ad hoc consensus reference sequence for constructing overlap graphs from patient sample data. In this overlap graph, nodes represent reads and/or contigs, while edges reflect that two reads/contigs, based on sound statistical considerations, represent identical haplotypic sequence. Following an iterative scheme, a new overlap assembly algorithm that is based on the enumeration of statistically well-calibrated groups of reads/contigs then efficiently reconstructs the individual haplotypes from this overlap graph. In benchmark experiments on simulated and on real deep-coverage data, SAVAGE drastically outperforms generic de novo assemblers as well as the only specialized de novo viral quasispecies assembler available so far. When run on ad hoc consensus reference sequence, SAVAGE performs very favorably in comparison with state-of-the-art reference genome-guided tools. We also apply SAVAGE on two deep-coverage samples of patients infected by the Zika and the hepatitis C virus, respectively, which sheds light on the genetic structures of the respective viral quasispecies.


Subject(s)
Contig Mapping/methods , Genome, Viral , Genomics/methods , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genomics/standards , Haplotypes , Hepacivirus/genetics , Polymorphism, Genetic , Reference Standards , Sequence Analysis, DNA/standards , Zika Virus/genetics
20.
Bioinformatics ; 33(24): 4015-4023, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28169394

ABSTRACT

MOTIVATION: Next Generation Sequencing (NGS) has enabled studying structural genomic variants (SVs) such as duplications and inversions in large cohorts. SVs have been shown to play important roles in multiple diseases, including cancer. As costs for NGS continue to decline and variant databases become ever more complete, the relevance of also genotyping SVs from NGS data increases steadily, which is in stark contrast to the lack of tools to do so. RESULTS: We introduce a novel statistical approach, called DIGTYPER (Duplication and Inversion GenoTYPER), which computes genotype likelihoods for a given inversion or duplication and reports the maximum likelihood genotype. In contrast to purely coverage-based approaches, DIGTYPER uses breakpoint-spanning read pairs as well as split alignments for genotyping, enabling typing also of small events. We tested our approach on simulated and on real data and compared the genotype predictions to those made by DELLY, which discovers SVs and computes genotypes, and SVTyper, a genotyping program used to genotype variants detected by LUMPY. DIGTYPER compares favorably especially for duplications (of all lengths) and for shorter inversions (up to 300 bp). In contrast to DELLY, our approach can genotype SVs from databases without having to rediscover them. AVAILABILITY AND IMPLEMENTATION: https://bitbucket.org/jana_ebler/digtyper.git. CONTACT: t.marschall@mpi-inf.mpg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromosome Duplication , Chromosome Inversion , Genomic Structural Variation , Genotyping Techniques/methods , Databases, Nucleic Acid , Genotype , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA , Sequence Deletion , Software
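
The abstract above describes computing genotype likelihoods for a variant and reporting the maximum likelihood genotype. A minimal sketch of that general idea follows, under strong simplifying assumptions that are not taken from the paper: evidence is reduced to counts of reads supporting and not supporting the variant allele, reads are independent, and a single hypothetical `error_rate` (strictly between 0 and 1) covers misclassified reads. DIGTYPER itself works on breakpoint-spanning read pairs and split alignments; its actual model is not reproduced here.

```python
import math

# Expected fraction of variant-supporting reads for each diploid genotype.
GENOTYPE_ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_log_likelihoods(n_alt, n_ref, error_rate=0.01):
    """Binomial log-likelihood of the read counts under each genotype.

    The binomial coefficient is omitted because it is identical for all
    genotypes and does not affect which one is most likely.
    """
    log_likelihoods = {}
    for genotype, alt_fraction in GENOTYPE_ALT_FRACTION.items():
        # Probability that a single read appears to support the variant.
        p_alt = alt_fraction * (1 - error_rate) + (1 - alt_fraction) * error_rate
        log_likelihoods[genotype] = (
            n_alt * math.log(p_alt) + n_ref * math.log(1 - p_alt)
        )
    return log_likelihoods


def call_genotype(n_alt, n_ref, error_rate=0.01):
    """Return the genotype with the highest likelihood."""
    log_likelihoods = genotype_log_likelihoods(n_alt, n_ref, error_rate)
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical counts: 9 reads support the inversion, 11 support the reference.
best = call_genotype(n_alt=9, n_ref=11)
```

Because the likelihoods are computed for a given variant, this style of genotyping can be applied to variants taken from a database without rediscovering them, which matches the use case the abstract highlights.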
...